Introduction

This notebook demonstrates how BioThings Explorer can be used to answer the following query:

         "What biosamples are associated with diseases related to gene SLC15A4?"

To experiment with an executable version of this notebook, .

Background: BioThings Explorer can answer two classes of queries: "EXPLAIN" and "PREDICT". EXPLAIN queries are described in EXPLAIN_demo.ipynb, and PREDICT queries are described in PREDICT_demo.ipynb. Here, we describe PREDICT queries and how to use BioThings Explorer to execute them. A more detailed overview of the BioThings Explorer system is provided in these slides.

In the first stage of the query, BTE calls all APIs that can provide association data between SLC15A4 and diseases, including:

  1. DISEASES API
  2. BIOLINK API
  3. SEMMED API
  4. MyDisease.info API
  5. CTD API

In the second stage of the query, BTE calls the APIs that can provide association data between those diseases and biosamples; for this query, that is the Stanford Biosample API.
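Conceptually, the two stages compose into paths of the form gene → disease → biosample. The sketch below illustrates that composition with plain Python; the `join_two_hops` helper, the `(subject, object, source)` edge shape, and all identifiers are hypothetical placeholders, not BTE's internal data structures or real query results.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical edge shape: (subject, object, source_api)
Edge = Tuple[str, str, str]


def join_two_hops(stage1: List[Edge], stage2: List[Edge]) -> List[Tuple[str, str, str, str, str]]:
    """Compose stage-1 edges (gene -> disease) with stage-2 edges (disease -> biosample).

    Returns one (gene, source1, disease, source2, biosample) tuple per complete path.
    """
    # Index stage-2 edges by their subject, i.e. the intermediate disease node.
    by_disease: Dict[str, List[Edge]] = defaultdict(list)
    for edge in stage2:
        by_disease[edge[0]].append(edge)

    paths = []
    for gene, disease, src1 in stage1:
        # A disease with no stage-2 edges contributes no complete paths.
        for _, biosample, src2 in by_disease.get(disease, []):
            paths.append((gene, src1, disease, src2, biosample))
    return paths


# Toy data with placeholder identifiers (not real API results).
stage1 = [
    ("SLC15A4", "DISEASE:A", "api_x"),
    ("SLC15A4", "DISEASE:A", "api_y"),
    ("SLC15A4", "DISEASE:B", "api_x"),
]
stage2 = [
    ("DISEASE:A", "SAMPLE:1", "biosample_api"),
    ("DISEASE:A", "SAMPLE:2", "biosample_api"),
]

paths = join_two_hops(stage1, stage2)
for p in paths:
    print(p)
```

Two effects of this kind of join are worth keeping in mind when reading the results: every additional source for the same gene-disease edge multiplies the number of rows, and intermediate nodes with no second-stage edges drop out entirely.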

Step 0: Load BioThings Explorer modules

Install the biothings_explorer package, as described in this README. This only needs to be done once, but the install command is included here so that the notebook also runs in a fresh environment.


In [ ]:
!pip install git+https://github.com/biothings/biothings_explorer#egg=biothings_explorer

In [1]:
from biothings_explorer.user_query_dispatcher import FindConnection

from biothings_explorer.hint import Hint

Step 1: Find representation of "SLC15A4" in BTE

In this step, BioThings Explorer translates our query string "SLC15A4" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the Hint module will be the correct item, but you should confirm that using the identifiers shown.

Search terms can correspond to any child of BiologicalEntity from the Biolink Model, including DiseaseOrPhenotypicFeature (e.g., "lupus"), ChemicalSubstance (e.g., "acetaminophen"), Gene (e.g., "CDK2"), BiologicalProcess (e.g., "T cell differentiation"), and Pathway (e.g., "Citric acid cycle").


In [2]:
ht = Hint()
SLC15A4 = ht.query("SLC15A4")['Gene'][0]
SLC15A4


Out[2]:
{'entrez': '121260',
 'name': 'solute carrier family 15 member 4',
 'symbol': 'SLC15A4',
 'taxonomy': 9606,
 'umls': 'C1427907',
 'uniprot': 'Q8N697',
 'hgnc': '23090',
 'ensembl': 'ENSG00000139370',
 'display': 'entrez(121260) name(solute carrier family 15 member 4) symbol(SLC15A4) taxonomy(9606) umls(C1427907) uniprot(Q8N697) hgnc(23090) ensembl(ENSG00000139370) ',
 'type': 'Gene',
 'primary': {'identifier': 'entrez', 'cls': 'Gene', 'value': '121260'}}
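Because the top Hint result is usually, but not always, the intended entity, it can help to select a candidate by checking its identifiers explicitly. The helper below is a hypothetical sketch, not part of BioThings Explorer; it only assumes the result shape used in the cell above: a dict keyed by semantic type whose values are lists of dicts with fields such as `'symbol'` and `'taxonomy'`.

```python
from typing import Any, Dict, List, Optional


def pick_candidate(
    hint_result: Dict[str, List[Dict[str, Any]]],
    semantic_type: str,
    symbol: str,
    taxonomy: int = 9606,  # NCBI Taxonomy ID for Homo sapiens
) -> Optional[Dict[str, Any]]:
    """Return the first candidate of `semantic_type` whose symbol and taxonomy match, else None."""
    for candidate in hint_result.get(semantic_type, []):
        if candidate.get("symbol") == symbol and candidate.get("taxonomy") == taxonomy:
            return candidate
    return None


# Offline stand-in mirroring fields shown in Out[2]; in the notebook you would
# pass the result of ht.query("SLC15A4") instead of this literal.
example = {
    "Gene": [
        {"symbol": "SLC15A4", "taxonomy": 9606, "entrez": "121260"},
    ]
}

gene = pick_candidate(example, "Gene", "SLC15A4")
print(gene["entrez"] if gene else "no match")
```

Returning `None` instead of silently falling back to index `[0]` makes a wrong or missing match visible before any APIs are queried.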

Step 2: Find biosamples linked to SLC15A4 through diseases

In this section, we find all paths in the knowledge graph that connect SLC15A4 to any Biosample entity through one intermediate DiseaseOrPhenotypicFeature node. To do that, we use FindConnection, a convenience class that wraps two lower-level functions: one for query path planning and one for query path execution.
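The verbose log of `fc.connect()` describes path planning as a lookup in a meta-KG for APIs whose input and output types match each hop. The sketch below models that lookup with a small in-memory list. The API names and type pairs are taken from this notebook's log, but the `MetaEdge` record format and the `plan` function are illustrative assumptions, not BTE's actual meta-KG schema, and the real meta-KG contains many more edges.

```python
from typing import List, NamedTuple


class MetaEdge(NamedTuple):
    api: str
    input_type: str
    output_type: str


# Toy meta-KG: API names from this notebook's log; the record format is assumed.
META_KG = [
    MetaEdge("semmeddisease", "Gene", "DiseaseOrPhenotypicFeature"),
    MetaEdge("ctd_gene2disease", "Gene", "DiseaseOrPhenotypicFeature"),
    MetaEdge("DISEASES", "Gene", "DiseaseOrPhenotypicFeature"),
    MetaEdge("mydisease.info", "Gene", "DiseaseOrPhenotypicFeature"),
    MetaEdge("biolink_gene2disease", "Gene", "DiseaseOrPhenotypicFeature"),
    MetaEdge("stanford_biosample_disease2sample", "DiseaseOrPhenotypicFeature", "Biosample"),
]


def plan(input_type: str, intermediates: List[str], output_type: str) -> List[List[str]]:
    """For each hop along input -> intermediates -> output, list the APIs that can serve it."""
    types = [input_type, *intermediates, output_type]
    hops = []
    for src, dst in zip(types, types[1:]):
        hops.append([e.api for e in META_KG if e.input_type == src and e.output_type == dst])
    return hops


for i, apis in enumerate(plan("Gene", ["DiseaseOrPhenotypicFeature"], "Biosample"), start=1):
    print(f"hop {i}: {apis}")
```

A hop whose API list is empty means no path can be completed through that sequence of types, which is why the choice of `intermediate_nodes` matters.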


In [3]:
fc = FindConnection(input_obj=SLC15A4, output_obj='Biosample', intermediate_nodes=['DiseaseOrPhenotypicFeature'])

In [4]:
fc.connect(verbose=True)


==========
========== QUERY PARAMETER SUMMARY ==========
==========

BTE will find paths that join 'SLC15A4' and 'Biosample'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: DiseaseOrPhenotypicFeature




========== QUERY #1 -- fetch all DiseaseOrPhenotypicFeature entities linked to SLC15A4 ==========
==========

==== Step #1: Query path planning ====

Because SLC15A4 is of type 'Gene', BTE will query our meta-KG for APIs that can take 'Gene' as input and 'DiseaseOrPhenotypicFeature' as output

BTE found 5 apis:

API 1. semmeddisease(6 API calls)
API 2. ctd_gene2disease(1 API call)
API 3. DISEASES(1 API call)
API 4. mydisease.info(1 API call)
API 5. biolink_gene2disease(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 4.1: http://mydisease.info/v1/query (POST "q=121260&scopes=disgenet.genes_related_to_disease.gene_id&fields=mondo.xrefs.umls,disgenet.xrefs.umls&species=human&size=100")
API 1.4: http://pending.biothings.io/semmed/query (POST "q=C1427907&scopes=AFFECTS_reverse.protein.umls&fields=umls&species=human&size=100")
API 1.3: http://pending.biothings.io/semmed/query (POST "q=C1427907&scopes=AFFECTS_reverse.gene.umls&fields=umls&species=human&size=100")
API 1.1: http://pending.biothings.io/semmed/query (POST "q=C1427907&scopes=AFFECTS.gene.umls&fields=umls&species=human&size=100")
API 1.5: http://pending.biothings.io/semmed/query (POST "q=C1427907&scopes=ASSOCIATED_WITH.gene.umls&fields=umls&species=human&size=100")
API 1.6: http://pending.biothings.io/semmed/query (POST "q=C1427907&scopes=CAUSES_reverse.gene.umls&fields=umls&species=human&size=100")
API 1.2: http://pending.biothings.io/semmed/query (POST "q=C1427907&scopes=AFFECTS.protein.umls&fields=umls&species=human&size=100")
API 3.1: https://pending.biothings.io/DISEASES/query (POST "q=SLC15A4&scopes=DISEASES.associatedWith.symbol&fields=_id&species=human&size=100")
API 5.1: https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:121260/diseases?rows=100
API 2.1: http://ctdbase.org/tools/batchQuery.go?inputType=gene&inputTerms=121260&report=diseases_curated&format=json
0, message='Attempt to decode JSON with unexpected mimetype: text/plain;charset=utf-8'


==== Step #3: Output normalization ====

API 3.1 DISEASES: 5 hits
API 5.1 biolink_gene2disease: 1 hits
API 2.1 ctd_gene2disease: 1 hits
API 4.1 mydisease.info: 2 hits
API 1.1 semmeddisease: No hits
API 1.2 semmeddisease: No hits
API 1.3 semmeddisease: 1 hits
API 1.4 semmeddisease: No hits
API 1.5 semmeddisease: No hits
API 1.6 semmeddisease: 2 hits

After id-to-object translation, BTE retrieved 8 unique objects.


========== QUERY #2.1 -- fetch all Biosample entities linked to DiseaseOrPhenotypicFeature entites ==========
==========

==== Step #1: Query path planning ====

Because None is of type 'DiseaseOrPhenotypicFeature', BTE will query our meta-KG for APIs that can take 'DiseaseOrPhenotypicFeature' as input and 'Biosample' as output

BTE found 1 apis:

API 1. stanford_biosample_disease2sample(5 API calls)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 1.4: http://api.kp.metadatacenter.org/biosample/search?q=biolink:Disease=DOID:0070216&limit=1000
API 1.1: http://api.kp.metadatacenter.org/biosample/search?q=biolink:Disease=MONDO:0005265&limit=1000
API 1.5: http://api.kp.metadatacenter.org/biosample/search?q=biolink:Disease=MONDO:0004670&limit=1000
API 1.2: http://api.kp.metadatacenter.org/biosample/search?q=biolink:Disease=MONDO:0005554&limit=1000
API 1.3: http://api.kp.metadatacenter.org/biosample/search?q=biolink:Disease=MONDO:0007915&limit=1000


==== Step #3: Output normalization ====

API 1.1 stanford_biosample_disease2sample: No hits
API 1.2 stanford_biosample_disease2sample: No hits
API 1.3 stanford_biosample_disease2sample: 770 hits
API 1.4 stanford_biosample_disease2sample: No hits
API 1.5 stanford_biosample_disease2sample: No hits

After id-to-object translation, BTE retrieved 770 unique objects.

==========
========== Final assembly of results ==========
==========


In the #1 query, BTE found 8 unique DiseaseOrPhenotypicFeature nodes
In the #2 query, BTE found 770 unique Biosample nodes
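In the first query, the per-API hit counts in the log sum to 12 (5 + 1 + 1 + 2 + 1 + 2), yet BTE reports 8 unique objects after id-to-object translation, which suggests that overlapping hits were merged. The sketch below illustrates that kind of merge under a loud assumption: a hypothetical `crosswalk` dict maps alternative identifiers to one canonical ID. BTE's actual translation step resolves identifiers through its own services and is not reproduced here; all IDs are placeholders.

```python
from typing import Dict, List, Set


def merge_hits(hits_by_api: Dict[str, List[str]], crosswalk: Dict[str, str]) -> Set[str]:
    """Map each raw identifier to a canonical ID via a hypothetical crosswalk and return the unique set.

    Identifiers missing from the crosswalk are kept unchanged.
    """
    unique: Set[str] = set()
    for ids in hits_by_api.values():
        for raw_id in ids:
            unique.add(crosswalk.get(raw_id, raw_id))
    return unique


# Placeholder data: api_y reports entity A:1 under an alternative identifier, B:9.
hits = {
    "api_x": ["A:1", "A:2"],
    "api_y": ["B:9"],
}
crosswalk = {"B:9": "A:1"}

merged = merge_hits(hits, crosswalk)
print(sum(len(v) for v in hits.values()), "raw hits ->", len(merged), "unique objects")
```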

Step 3: Explore the results

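Through BTE, we found 8 DiseaseOrPhenotypicFeature entities associated with the gene SLC15A4, and 770 Biosample entities associated with those diseases. All 770 biosamples come from a single second-stage query (API 1.3, for MONDO:0007915); the other four disease queries returned no hits. The table has 3080 rows, which equals 770 × 4 and is consistent with each biosample being listed once per gene-disease source (DISEASES, biolink, CTD and mydisease.info), as rows 0-3 illustrate for SAMEA2402387.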


In [5]:
fc.display_table_view()


Out[5]:
input input_type pred1 pred1_source pred1_api pred1_pubmed node1_id node1_name node1_type pred2 pred2_source pred2_api pred2_pubmed output_id output_name output_type
0 SLC15A4 Gene associatedWith DISEASES DISEASES None None systemic lupus erythematosus (disease) DiseaseOrPhenotypicFeature diseaseAssociatedWithBiosample NCBI Biosample Database stanford_biosample_disease2sample None None SAMEA2402387 Biosample
1 SLC15A4 Gene associatedWith biolink biolink_gene2disease None systemic lupus erythematosus (disease) DiseaseOrPhenotypicFeature diseaseAssociatedWithBiosample NCBI Biosample Database stanford_biosample_disease2sample None None SAMEA2402387 Biosample
2 SLC15A4 Gene associatedWith CTD ctd_gene2disease None systemic lupus erythematosus (disease) DiseaseOrPhenotypicFeature diseaseAssociatedWithBiosample NCBI Biosample Database stanford_biosample_disease2sample None None SAMEA2402387 Biosample
3 SLC15A4 Gene associatedWith mydisease.info mydisease.info None None systemic lupus erythematosus (disease) DiseaseOrPhenotypicFeature diseaseAssociatedWithBiosample NCBI Biosample Database stanford_biosample_disease2sample None None SAMEA2402387 Biosample
4 SLC15A4 Gene associatedWith DISEASES DISEASES None None systemic lupus erythematosus (disease) DiseaseOrPhenotypicFeature diseaseAssociatedWithBiosample NCBI Biosample Database stanford_biosample_disease2sample None None SAMEA4456858 Biosample
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3075 SLC15A4 Gene associatedWith mydisease.info mydisease.info None None systemic lupus erythematosus (disease) DiseaseOrPhenotypicFeature diseaseAssociatedWithBiosample NCBI Biosample Database stanford_biosample_disease2sample None None SAMEA104266477 Biosample
3076 SLC15A4 Gene associatedWith DISEASES DISEASES None None systemic lupus erythematosus (disease) DiseaseOrPhenotypicFeature diseaseAssociatedWithBiosample NCBI Biosample Database stanford_biosample_disease2sample None None SAMN04017728 Biosample
3077 SLC15A4 Gene associatedWith biolink biolink_gene2disease None systemic lupus erythematosus (disease) DiseaseOrPhenotypicFeature diseaseAssociatedWithBiosample NCBI Biosample Database stanford_biosample_disease2sample None None SAMN04017728 Biosample
3078 SLC15A4 Gene associatedWith CTD ctd_gene2disease None systemic lupus erythematosus (disease) DiseaseOrPhenotypicFeature diseaseAssociatedWithBiosample NCBI Biosample Database stanford_biosample_disease2sample None None SAMN04017728 Biosample
3079 SLC15A4 Gene associatedWith mydisease.info mydisease.info None None systemic lupus erythematosus (disease) DiseaseOrPhenotypicFeature diseaseAssociatedWithBiosample NCBI Biosample Database stanford_biosample_disease2sample None None SAMN04017728 Biosample

3080 rows × 16 columns
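Because the table has one row per path, the same biosample repeats once for each supporting source. A minimal way to collapse it to one entry per biosample is sketched below. It assumes, as Out[5] suggests, that `fc.display_table_view()` returns a pandas DataFrame, and that the biosample accession sits in the `output_name` column, as the flattened table above indicates. The sketch works on plain records (for example from pandas' `DataFrame.to_dict("records")`) so it runs without BTE; `summarize_biosamples` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


def summarize_biosamples(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Set[str]]]:
    """Collapse one-row-per-path records into one entry per biosample.

    Each entry collects the intermediate disease names and the gene-disease sources supporting it.
    """
    summary: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: {"diseases": set(), "sources": set()})
    for row in rows:
        entry = summary[row["output_name"]]
        entry["diseases"].add(row["node1_name"])
        entry["sources"].add(row["pred1_source"])
    return dict(summary)


# In the notebook (assumption: display_table_view() returns the DataFrame shown above):
#   rows = fc.display_table_view().to_dict("records")
# Offline stand-in built from values visible in rows 0-3 of the table:
rows = [
    {
        "output_name": "SAMEA2402387",
        "node1_name": "systemic lupus erythematosus (disease)",
        "pred1_source": src,
    }
    for src in ["DISEASES", "biolink", "CTD", "mydisease.info"]
]

summary = summarize_biosamples(rows)
print(len(rows), "rows ->", len(summary), "biosample(s)")
print(sorted(summary["SAMEA2402387"]["sources"]))
```

Keeping the supporting sources as a set, rather than discarding duplicates outright, preserves how many independent gene-disease sources back each biosample.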